[New Workflow] AMR-search for neisseria gonorrhoeae samples #743

fraser-combe · 2025-02-03T17:15:59Z

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

This PR creates a standalone workflow for Pathogenwatch AMRsearch so that its antimicrobial resistance (AMR) profiling steps can be used on their own. The resulting JSON is processed by a bundled Python script, parse_amr_json.py, which extracts the relevant data into a CSV file and a PNG summary table for visualization. This workflow will likely be integrated into TheiaProk at a later date.

Documentation for this new workflow has been created.

⚡ Impacted Workflows/Tasks

This PR may lead to different results in pre-existing outputs: No

This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

Adds wf_amr_search.wdl along with task_amr_search.wdl.
Builds a new Docker container with Pathogenwatch AMRsearch and PAARSNP installed.

⚙️ Algorithm

PAARSNP is run on a microbial FASTA file and generates a JSON file containing AMR profiling information. This JSON is then passed to a Python script, parse_amr_json.py, which is housed within the Docker container us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.0. The script parses the JSON and creates a CSV and a PNG that resemble Pathogenwatch's AMR profile output.
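For readers who want a feel for the JSON-to-CSV step, here is a minimal sketch. It is not the parse_amr_json.py shipped in the container: the JSON layout (`resistanceProfile`, `agent`, `state`, `determinants`), the column names, and the helper names `profile_to_rows` and `rows_to_csv` are assumptions made for illustration, and the real PAARSNP schema may differ.

```python
import csv
import io
import json

# Hypothetical, simplified PAARSNP-style report; the real schema may differ.
EXAMPLE_JSON = """
{
  "resistanceProfile": [
    {"agent": {"name": "Tetracycline"}, "state": "RESISTANT",
     "determinants": {"acquired": ["tetM"], "variants": ["rpsj_V57M"]}},
    {"agent": {"name": "Ceftriaxone"}, "state": "NOT_FOUND",
     "determinants": {"acquired": [], "variants": []}}
  ]
}
"""


def profile_to_rows(report):
    """Flatten each resistance-profile entry into one dict per agent."""
    rows = []
    for entry in report.get("resistanceProfile", []):
        determinants = entry.get("determinants", {})
        found = determinants.get("acquired", []) + determinants.get("variants", [])
        rows.append({
            "agent": entry["agent"]["name"],
            "state": entry["state"],
            "determinants": "; ".join(found) if found else "none",
        })
    return rows


def rows_to_csv(rows):
    """Serialize the flattened rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["agent", "state", "determinants"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(profile_to_rows(json.loads(EXAMPLE_JSON)))
print(csv_text)
```

The PNG summary table is left out here; it would be rendered from the same flattened rows.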

➡️ Inputs

input_fasta -> a microbial FASTA file
samplename -> Name which the user wants prefixed to output files
amr_search_database -> The NCBI taxon code that is used by PAARSNP to pull the correct .toml file from the amr-libraries stored within the docker container.
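As a rough illustration of how a taxon code could select a library file, the sketch below looks up `<taxon_code>.toml` in a directory. The directory layout, the file naming, and the helper name `find_library` are assumptions for illustration only; the container may organize its amr-libraries differently. The code 485 is the NCBI taxonomy ID for Neisseria gonorrhoeae.

```python
import tempfile
from pathlib import Path


def find_library(taxon_code, library_dir):
    """Return the path to '<taxon_code>.toml' inside library_dir.

    Assumes one TOML file per taxon code (hypothetical layout).
    Raises FileNotFoundError when the taxon is not supported.
    """
    candidate = Path(library_dir) / f"{taxon_code}.toml"
    if not candidate.is_file():
        raise FileNotFoundError(
            f"No AMR library found for taxon code {taxon_code!r} in {library_dir}"
        )
    return candidate


# Demonstration against a throwaway directory holding a placeholder library.
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "485.toml").write_text("# placeholder library\n")
    print(find_library("485", tmp_dir).name)
```

Failing early on an unsupported taxon code gives the user a clearer message than a downstream PAARSNP error would.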

⬅️ Outputs

amr_search_results -> JSON output of AMR profiling information
amr_results_csv -> CSV format of AMR profiling information
amr_results_png -> PNG format of AMR profiling information, resembling the Pathogenwatch PDF output

🧪 Testing

Verified that AMR results are correctly parsed and formatted in the JSON, CSV, and PNG outputs, both locally and in Terra.
Verified that AMR results are identical to, or differ only minimally from, Pathogenwatch outputs. The minimal differences arise because Pathogenwatch uses AMRsearch libraries v0.0.17, whereas this workflow uses v0.0.20.

Initial Terra Test
Test Containing All Species
E. coli was not included in this test because no public examples were available in Pathogenwatch; PAARSNP/AMRsearch was not run prior to a certain date.
GCA_011383385_typhi: The newer 0.0.20 database additionally predicts sul2.
GCA_042331435_GC: The newer 0.0.20 database additionally predicts tetM and rpsj_V57M. The inferred tetracycline resistance changed from intermediate to resistant.
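A comparison like the one described above can be sketched as a small diff over per-agent states. This is not the procedure used for the tests in this PR; the helper name `diff_profiles`, the state labels, and the azithromycin entry are hypothetical placeholders. Only the tetracycline change from intermediate to resistant mirrors the observation reported here.

```python
def diff_profiles(old, new):
    """Return {agent: (old_state, new_state)} for every agent whose state differs.

    Agents present in only one profile are reported with None on the other side.
    """
    changes = {}
    for agent in sorted(set(old) | set(new)):
        before = old.get(agent)
        after = new.get(agent)
        if before != after:
            changes[agent] = (before, after)
    return changes


# Illustrative values only; azithromycin is a placeholder entry.
pathogenwatch_v0_0_17 = {"Tetracycline": "INTERMEDIATE", "Azithromycin": "SENSITIVE"}
workflow_v0_0_20 = {"Tetracycline": "RESISTANT", "Azithromycin": "SENSITIVE"}

print(diff_profiles(pathogenwatch_v0_0_17, workflow_v0_0_20))
```

An empty dictionary means the two runs agree for every agent.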

Suggested Scenarios for Reviewer to Test

wf_amr_search.wdl produces the correct outputs (PNG, JSON, and CSV).
If you previously used Pathogenwatch, run the workflow on the same samples and compare against the existing results.

🔬 Final Developer Checklist

The workflow/task has been tested and results, including file contents, are as anticipated
The CI/CD has been adjusted and tests are passing (Theiagen developers)
Code changes follow the style guide
Documentation and/or workflow diagrams have been updated if applicable
- You have updated the "Last Known Changes" field for any affected workflows in the respective workflow documentation page and for every entry in the three workflows_overview tables to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"

🎯 Reviewer Checklist

All changed results have been confirmed
You have tested the PR appropriately (see the testing guide for more information)
All code adheres to the style guide
MD5 sums have been updated
The PR author has addressed all comments
The documentation has been updated


AndrewLangvt

Some initial changes here @awh082834. Mostly minor documentation/namespace polishing. Well done with the overarching schema. We'll see what the UAT brings about for feedback.

AndrewLangvt · 2025-02-19T13:50:34Z

workflows/utilities/wf_amr_search.wdl

+workflow amr_search_workflow {
+  input {
+    File input_fasta
+    String amr_search_database


Please consider changing this as the user doesn't need to pass in a DB, only the taxon code that then references the correct DB to use. Taxon/taxon_of_interest/taxon_code might make more sense. Some of this will depend on how we do the mapping when we plug into TheiaProk, but just dropping this as a note for future work when we implement that integration upstream.

AndrewLangvt · 2025-02-19T13:56:43Z

docs/workflows/standalone/amr_search.md

+
+## AMR_Search_PHB
+
+The AMR_Search workflow is a standalone version of Pathogenwatchs AMR profiling functionality utilizing `AMRsearch` tool from Pathogenwatch.


Pathogenwatch's

AndrewLangvt · 2025-02-19T14:16:52Z

docs/workflows/standalone/amr_search.md

+| amr_search | **disk_size** | Integer | Amount of storage (in GB) to allocate to the task |50| Optional |
+| amr_search | **docker** | String | The docker container to use for the task |us-docker.pkg.dev/general-theiagen/theiagen/amrsearch:0.2.0| Optional |
+| amr_search | **memory** | Integer | Amount of memory/RAM (in GB) to allocate to the task |8| Optional |
+| amr_search_workflow | **amr_search_database** | String | NCBI taxon code of samples known taxonomy, see above supported species || Required |


move up so required inputs are listed first

AndrewLangvt · 2025-02-19T14:19:49Z

docs/workflows/genomic_characterization/theiaprok.md

+
+        This task performs *in silico* antimicrobial resistance (AMR) profiling for *Neisseria gonorrhoeae* using **AMRsearch**, the primary tool used by [Pathogenwatch](https://pathogen.watch/) to genotype and infer antimicrobial resistance (AMR) phenotypes from assembled microbial genomes.
+
+        **AMRsearch** screens against an in-house library of curated genotypes and inferred phenotypes, developed in collaboration with community experts. Resistance phenotypes are determined based on both **resistance genes** and **mutations**, and the system accounts for interactions between multiple SNPs, genes, and suppressors. Predictions follow **S/I/R classification** (*Sensitive, Intermediate, Resistant*).


This isn't Theiagen's "in-house library" so we probably want to adjust this. I am guessing this is a copy/paste from PW description of AMR_search (totally fine), but we'll want to have it reflect Theiagen, not PW. I.e. "Screens against Pathogenwatch's library of curated genotypes..."

awh082834 (Feb 19, 2025): Ah yes, this was a holdover from the original documentation that I decided to keep. In context it definitely sounds like it's our library. I'll get this changed!

AndrewLangvt · 2025-02-19T14:20:29Z

docs/workflows/genomic_characterization/theiaprok.md

+            | Software Documentation | [Pathogenwatch](https://cgps.gitbook.io/pathogenwatch) |
+            | Original Publication(s) | [PAARSNP: *rapid genotypic resistance prediction for *Neisseria gonorrhoeae*](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545138/) |
+
+        !!! techdetails "`parse_amr_json.wdl` Details"


AndrewLangvt · 2025-02-19T14:21:26Z

docs/workflows/standalone/amr_search.md

+| amr_search_docker | String | Docker image used to run AMR_Search |
+| amr_search_version | String | Version of AMR_Search libraries used |
+
+## References (if applicable)


remove "(if applicable)"

AndrewLangvt · 2025-02-19T14:23:00Z

tasks/gene_typing/drug_resistance/task_amr_search.wdl

+  }
+  command <<< 
+    # Extract base name without path or extension
+    # Added suffix strip to handle cases of differing FASTA extensions. Was hard coded to .fasta


"Strip suffix to handle cases of differing FASTA extension."

we need not describe what was added/what previously existed, only comment the existing code for functional understanding/comprehension
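For reference, the suffix-stripping idea can be sketched in Python as follows. The task itself does this in shell inside the `command` block; the extension list and the helper name `fasta_base_name` are assumptions for illustration and may not match what the task accepts.

```python
from pathlib import Path

# Assumed set of common FASTA extensions; the task may accept others.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fas")


def fasta_base_name(path):
    """Return the file name without its directory or FASTA extension."""
    name = Path(path).name
    for suffix in FASTA_SUFFIXES:
        # Case-insensitive match so 'SAMPLE.FA' is handled too.
        if name.lower().endswith(suffix):
            return name[: -len(suffix)]
    # Unknown extension: leave the name untouched rather than guess.
    return name


print(fasta_base_name("/data/assemblies/sample1.fasta"))
print(fasta_base_name("sample.v2.fna"))
```

Only the final extension is removed, so dots inside the sample name are preserved.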

AndrewLangvt · 2025-02-19T14:23:39Z

tasks/gene_typing/drug_resistance/task_amr_search.wdl

+    # Move the output file from the input directory to the working directory
+    mv $(dirname ~{input_fasta})/${input_base}_paarsnp.jsn ./~{samplename}_paarsnp_results.jsn
+
+    python3 /scripts/parse_amr_json.py \


maybe worth a comment here with a link to the location of this script, for posterity's sake

awh082834 (Feb 19, 2025): Sounds good! I'll put the link to the current dev branch of the docker builds repo and will update it when it gets merged.

fraser-combe and others added 13 commits January 26, 2025 20:44

amrsearch wdl and json parser

8dddc4c

Updated AMR search workflow into Merlin_Magic

aaa29d1

update mm

b7e015e

update NC output amrsearch

84b4e35

update docs amr search in GC section

fe1fef2

add MM files

6e64915

rename database input parameter to amr_search_database in amr_search_…

da52a60

…workflow

remove integration with TheiaProk and merlin_magic; just commented ou…

26fd16e

…t for later integration

condense into one task; updated docker image, added python to image; …

b2c581d

…made input_base more robust

Dockstore yml amrsearch inclusion

bf73aba

non optional database input

d73b845

Documentation additions for standalone AMR_Search

6ebbea4

removal of parse_amr_json; no longer needed as script is within the i…

72b313a

…mage

AndrewLangvt reviewed Feb 19, 2025

minor documentation changes

d9a6f10


fraser-combe commented Feb 3, 2025 (edited by awh082834)
